An Efficient Implementation of Stencil Communication for the XcalableMP PGAS Parallel Programming Language
Abstract
Partitioned Global Address Space (PGAS) programming languages have emerged as a means of programming parallel computers, which are becoming ever larger and more complicated. For such languages, regular stencil codes remain one of the most important targets. We implemented three methods of stencil communication in a compiler for the PGAS language XcalableMP: 1) a method based on derived-datatype messaging; 2) a method based on packing/unpacking, which is especially effective in multicore environments; and 3) an experimental method based on one-sided communication on the K computer, where an RDMA function suitable for one-sided communication is available. We evaluated their performance on the K computer. We found that the first and second methods are effective under different conditions, and that selecting between them at runtime would be practical. We also found that the third method is promising, but that synchronization remains an obstacle to higher performance.
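A minimal sketch of the packing/unpacking strategy (method 2 above), assuming a 2D block distribution in which the halo to be exchanged is a non-contiguous column of the local array; the function names, sizes, and layout are illustrative assumptions, not taken from the XcalableMP runtime:

```c
#include <assert.h>

/* Sketch of pack/unpack halo exchange (method 2 in the abstract):
   the non-contiguous boundary column of a local 2D block is gathered
   into a contiguous buffer before sending, and scattered back out of
   a contiguous buffer after receiving.  All names and sizes here are
   illustrative; they are not from the XcalableMP runtime. */

#define NX 4          /* local block height (rows)    */
#define NY 4          /* local block width  (columns) */

/* Gather column `col` of an NX x NY block into a contiguous buffer. */
static void pack_column(double grid[NX][NY], int col, double *buf) {
    for (int i = 0; i < NX; i++)
        buf[i] = grid[i][col];
}

/* Scatter a contiguous buffer into column `col` of an NX x NY block. */
static void unpack_column(double grid[NX][NY], int col, const double *buf) {
    for (int i = 0; i < NX; i++)
        grid[i][col] = buf[i];
}
```

By contrast, the derived-datatype approach (method 1) would describe the same strided column directly to the communication library (e.g., with an `MPI_Type_vector`-style datatype), letting the library perform the copy, or avoid it, internally rather than using explicit pack/unpack buffers.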
Similar resources
High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library
The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high...
LLVM Optimizations for PGAS Programs Case Study: LLVM Wide Pointer Optimizations in Chapel
PGAS programming languages such as Chapel, Coarray Fortran, Habanero-C, UPC and X10 [3–6, 8] support high-level and highly productive programming models for large-scale parallelism. Unlike message-passing models such as MPI, which introduce nontrivial complexity due to message-passing semantics, PGAS languages simplify distributed parallel programming by introducing higher-level parallel languag...
Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
The relaxed semantics and rich functionality of the one-sided communication primitives of MPI-3 make MPI an attractive candidate for the implementation of PGAS models. However, the performance of such an implementation suffers from the fact that current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In ...
OSPRI: An Optimized One-Sided Communication Runtime for Leadership-Class Machines
Partitioned Global Address Space (PGAS) programming models provide a convenient approach to implementing complex scientific applications by providing access to a large, globally accessible address space. This paper describes the design, implementation and performance of a new one-sided communication library that attempts to meet the needs of PGAS models, particularly Global Arrays, but hopefull...
Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications
DASH is a library of distributed data structures and algorithms designed for running applications on modern HPC architectures, composed of hierarchical network interconnections and stratified memory. DASH implements a PGAS (partitioned global address space) model in the form of C++ templates, built on top of DART – a run-time system with an abstracted tier above existing one-sided communica...
Publication date: 2013